This tutorial is a step-by-step guide to creating publication-ready figures using ggplot2 and the data from palmerpenguins.
# Install the package
remotes::install_github("allisonhorst/palmerpenguins")
# Load the package
library(palmerpenguins)
# Load the data into the Global Environment
data("penguins")
# View the data
head(penguins)
## # A tibble: 6 x 7
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <fct> <fct> <dbl> <dbl> <int> <int> <fct>
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 fema…
## 3 Adelie Torge… 40.3 18 195 3250 fema…
## 4 Adelie Torge… NA NA NA NA <NA>
## 5 Adelie Torge… 36.7 19.3 193 3450 fema…
## 6 Adelie Torge… 39.3 20.6 190 3650 male
ggplot2
First we will create a basic scatterplot of bill_length_mm (y-axis) against body_mass_g (x-axis).
# Load the package
library(ggplot2)
ggplot(penguins, aes(body_mass_g, bill_length_mm))+ # this is the data
geom_point() # here we add the points
We can manually change the size of our data points. The points in the default plot are quite small, so let's increase their size with size = 3.
ggplot(penguins, aes(body_mass_g, bill_length_mm))+
geom_point(size = 3)
In ggplot2, it is possible to change the shape of the points. The built-in shapes are numbered 0 to 25.
The shape of all datapoints can be changed with e.g. shape = 8.
ggplot(penguins, aes(body_mass_g, bill_length_mm))+
geom_point(size = 3, shape = 8)
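A reference chart of the available shapes can be drawn directly in ggplot2. The following is a minimal sketch; the names shapes and shape_chart and the grid layout are illustrative choices, not part of the original tutorial.

```r
library(ggplot2)

# One row per built-in shape code (0 to 25), laid out on a 7-column grid
id <- 0:25
shapes <- data.frame(shape = id, x = id %% 7, y = -(id %/% 7))

shape_chart <- ggplot(shapes, aes(x, y)) +
  geom_point(aes(shape = shape), size = 5, fill = "grey70") + # fill only affects shapes 21 to 25
  geom_text(aes(label = shape), nudge_y = -0.35) +            # print the code under each symbol
  scale_shape_identity() +                                     # use the numbers as shape codes directly
  theme_void()
shape_chart
```

Shapes 21 to 25 take both a colour (outline) and a fill, which is why only those appear grey in the chart.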
Alternatively, we can change the shape of our points based on species with aes(shape = species).
ggplot(penguins, aes(body_mass_g, bill_length_mm))+
geom_point(aes(shape = species), size = 3)
You can also change the opacity of the data points using alpha. Alpha values must be between 0 and 1, where 0 is fully transparent and 1 is fully opaque.
ggplot(penguins, aes(body_mass_g, bill_length_mm))+
geom_point(aes(shape = species), size = 3, alpha = 0.6)
Now let's explore the different species by adding colour with the code colour = species.
ggplot(penguins, aes(body_mass_g, bill_length_mm))+
geom_point(aes(shape = species, colour = species), size = 3, alpha = 0.6)
This red-green colour combination is not colourblind-friendly, so let's change the colour of the points with scale_colour_manual. To ensure the shapes match the species names, we will also use scale_shape_manual.
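ggplot(penguins, aes(body_mass_g, bill_length_mm))+
  geom_point(aes(shape = species, colour = species), size = 3, alpha = 0.6)+
  # named values tie each colour and shape to a species;
  # labels follow the factor level order (Adelie, Chinstrap, Gentoo)
  scale_colour_manual(values = c(Adelie = "#FF6A00", Chinstrap = "#C15CCB", Gentoo = "#00868B"),
                      labels = c("Adélie", "Chinstrap", "Gentoo"))+
  scale_shape_manual(values = c(Adelie = 16, Chinstrap = 17, Gentoo = 15),
                     labels = c("Adélie", "Chinstrap", "Gentoo"))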
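Manual scales assign unnamed values and labels to the factor levels in order, so it can help to check that order before writing them down. A small sketch of that check follows; species_levels and species_colours are illustrative names only.

```r
library(palmerpenguins)

# The order of the factor levels is the order in which unnamed
# values and labels are matched to the species
species_levels <- levels(penguins$species)
species_levels
# "Adelie" "Chinstrap" "Gentoo"

# A named vector makes the colour-to-species pairing explicit
species_colours <- c(Adelie = "#FF6A00", Chinstrap = "#C15CCB", Gentoo = "#00868B")
setequal(names(species_colours), species_levels)
```

Passing a named vector such as species_colours to values = means the pairing no longer depends on the order in which the colours are typed.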
We won’t change the points any more, so let’s save the plot as penguin_plot, so we can build upon it.
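penguin_plot <- ggplot(penguins, aes(body_mass_g, bill_length_mm))+
  geom_point(aes(shape = species, colour = species), size = 3, alpha = 0.6)+
  # named values tie each colour and shape to a species;
  # labels follow the factor level order (Adelie, Chinstrap, Gentoo)
  scale_colour_manual(values = c(Adelie = "#FF6A00", Chinstrap = "#C15CCB", Gentoo = "#00868B"),
                      labels = c("Adélie", "Chinstrap", "Gentoo"))+
  scale_shape_manual(values = c(Adelie = 16, Chinstrap = 17, Gentoo = 15),
                     labels = c("Adélie", "Chinstrap", "Gentoo"))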
You can change the background of ggplot2 figures in a variety of ways with:
theme_gray(), theme_bw(), theme_linedraw(), theme_light(), theme_minimal(), theme_classic(), theme_void() and theme_dark().
My personal favourite is theme_bw(), so we will continue to build our plot with this theme.
We will further remove the thicker lines in the background with panel.grid.major = element_blank(), and the thinner lines with panel.grid.minor = element_blank().
penguin_plot +
theme_bw()+ # set the background theme
theme(panel.grid.major = element_blank(), # remove the major lines
panel.grid.minor = element_blank()) # remove the minor lines
There are several ways to change the size of the font, but we can quickly change all font sizes at once with theme_bw(base_size = 15).
penguin_plot <- penguin_plot +
theme_bw(base_size = 15)+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
penguin_plot
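When individual text elements need different sizes, they can be set one at a time inside theme(). The sketch below assumes a fresh scatterplot rather than penguin_plot so that it runs on its own; sized_plot and the chosen sizes are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

sized_plot <- ggplot(penguins, aes(body_mass_g, bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE) +                          # silently drop rows with missing values
  theme_bw() +
  theme(axis.title   = element_text(size = 16),       # axis titles
        axis.text    = element_text(size = 12),       # tick labels
        legend.title = element_text(size = 14),       # legend heading
        legend.text  = element_text(size = 12))       # legend entries
sized_plot
```

Settings made in theme() after theme_bw() override the defaults that base_size would otherwise apply to those elements.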
You can rename the axes and legend using labs.
penguin_plot <- penguin_plot +
labs(x = "Body Mass (g)", y = "Bill Length (mm)", colour = "Species", shape = "Species")
penguin_plot
In the default plot, the axis titles are very close to the plot. We will increase their distance with vjust:
penguin_plot <- penguin_plot +
theme(axis.title.y = element_text(vjust = 3))+ # increase distance from the y-axis
theme(axis.title.x = element_text(vjust = -1)) # increase distance from the x-axis
penguin_plot
Legends can be positioned in a number of different ways using legend.position:
theme(legend.position = "top"), theme(legend.position = "bottom"), theme(legend.position = "left"), theme(legend.position = "right") or theme(legend.position = "none").
The legend location can also be a numeric vector c(x, y), where x and y are the coordinates of the legend box. Their values should be between 0 and 1: c(0, 0) is the “bottom left” and c(1, 1) is the “top right” position.
penguin_plot <- penguin_plot +
theme(legend.position = c(0.85, 0.2))
penguin_plot
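From ggplot2 3.5.0 onwards, a numeric legend.position is deprecated in favour of legend.position = "inside" together with legend.position.inside. The sketch below picks the form that matches the installed version; base_plot and legend_theme are illustrative names, and a fresh plot is used so the example is self-contained.

```r
library(ggplot2)
library(palmerpenguins)

base_plot <- ggplot(penguins, aes(body_mass_g, bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)

# Choose the syntax supported by the installed ggplot2 version
legend_theme <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside",
        legend.position.inside = c(0.85, 0.2))
} else {
  theme(legend.position = c(0.85, 0.2))
}

base_plot + legend_theme
```

Both branches place the legend at the same relative coordinates inside the plotting area.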
Journals usually require high-quality images and specify the required size and format on their websites.
ggsave
ggsave is one way to save your figures.
It allows you to save high-quality images in a variety of file types (e.g. “eps”, “ps”, “tex”, “pdf”, “jpeg”, “tiff”, “png”, “bmp”, “svg”, “wmf”). You can also specify the width and height in “in”, “cm”, or “mm”, and set the resolution of raster formats with dpi.
# This saves the last plot that was displayed:
ggsave("penguin_plot.pdf",
  dpi = 600,
  width = 100, height = 60, units = "mm") # the argument is called units
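Relying on the last displayed plot can save the wrong figure if another plot was drawn in between, so it is often safer to pass the plot object explicitly. A minimal sketch, assuming a throwaway plot p and a temporary output path out_file (both illustrative):

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_point(na.rm = TRUE)

# Write to a temporary folder so the example leaves no files behind
out_file <- file.path(tempdir(), "penguin_plot.png")

ggsave(out_file, plot = p,                      # name the plot to save explicitly
       width = 100, height = 60, units = "mm",
       dpi = 600)

file.exists(out_file)
```

In practice, replace out_file with the file name and dimensions requested by the journal.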
pdf
An alternative method is to use pdf(). This allows you to specify the colour model (e.g. cmyk).
pdf("ggplot-cmyk.pdf", width = 12 / 2.54, height = 8 / 2.54,
colormodel = "cmyk")
print(penguin_plot)
dev.off()
ggplot2 plots and tricks
Point size can also be mapped to data within the penguins data frame. For example, here the size of the points depends on bill_depth_mm.
ggplot(penguins, aes(body_mass_g, bill_length_mm))+
geom_point(aes(size = bill_depth_mm))
Add 95% confidence ellipses with stat_ellipse.
ellipse <- penguin_plot +
  stat_ellipse(aes(colour = species), level = 0.95) # level is an argument, not an aesthetic
ellipse
Add a linear regression line with geom_smooth(method = "lm").
lm <- penguin_plot +
  geom_smooth(method = "lm", aes(colour = species)) # quoted, so it still works after the plot object is also named lm
lm
## `geom_smooth()` using formula 'y ~ x'
facet_wrap
facet_wrap is a fantastic tool to split plots based on a specified categorical column. Here we will use the species column.
penguin_plot +
facet_wrap(~species, ncol = 3, nrow = 1) + # specifying 3 columns, 1 row
theme(legend.position = "none") # remove legend
It is also possible to change the size, colour, face and fill colour of the facet strips with the following code:
penguin_plot +
facet_wrap(~species, ncol = 3, nrow = 1) + # specifying 3 columns, 1 row
theme(legend.position = "none")+ # remove legend
theme(strip.text.x = element_text(size = 16, color = "white", face = "bold"),
strip.background = element_rect(fill="black"))
We can also remove the empty space in each panel by giving every facet its own axis ranges with scales = "free".
penguin_plot +
facet_wrap(~species, scales = "free")+
theme(legend.position = "none")+ # remove legend
theme(strip.text.x = element_text(size = 16, color = "white", face = "bold"),
strip.background = element_rect(fill="black"))
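The scales argument also accepts "free_x" and "free_y" to free only one axis, which keeps the other axis directly comparable across panels. A short sketch on a fresh plot (free_x_plot is an illustrative name):

```r
library(ggplot2)
library(palmerpenguins)

free_x_plot <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_point(na.rm = TRUE) +
  facet_wrap(~species, scales = "free_x") # x ranges differ per panel, y axis stays shared
free_x_plot
```

Here bill length can still be compared across species because all three panels share the same y-axis.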
facet_grid: Free facet width
To allow the facets to have different widths, we must use facet_grid with space = "free". This can be combined with scales = "free".
penguin_plot +
facet_grid(.~species, scales = "free", space = "free") +
theme(legend.position = "none")+ # remove legend
theme(strip.text.x = element_text(size = 16, color = "white", face = "bold"),
strip.background = element_rect(fill="black"))
library(dplyr)
penguin_summary <- penguins %>%
group_by(species) %>%
  summarise(mean = mean(body_mass_g, na.rm = TRUE), # TRUE is safer than T, which can be reassigned
            sd = sd(body_mass_g, na.rm = TRUE))
ggplot(penguin_summary, aes(y = mean, x=species, fill = species)) +
geom_bar(stat="identity")+
geom_errorbar(aes(ymin = mean-sd, ymax = mean+sd),
width=.1)
We can style the bar plot with the same steps we used for the scatterplot above.
bar <- ggplot(penguin_summary, aes(y = mean, x = species, fill = species)) +
geom_bar(stat = "identity")+
geom_errorbar(aes(ymin = mean-sd, ymax = mean+sd),
width=.1)+
theme_bw(base_size = 20)+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
labs(x = "Species", y = "Body Mass (g)")+
scale_fill_manual(values = c( "#FF6A00","#C15CCB", "#00868B"))+
theme(legend.position = "none") +
scale_y_continuous(limits = c(0, 5700))+
theme(axis.title.y = element_text(vjust = 3)) +
theme(axis.title.x = element_text(vjust = -1))
bar
ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
geom_histogram(alpha = 0.4)+
scale_fill_manual(values = c( "#FF6A00","#C15CCB", "#00868B"))
ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
geom_density(alpha = 0.4)+
scale_fill_manual(values = c( "#FF6A00","#C15CCB", "#00868B"))
flipper <- ggplot(penguins, aes(x = flipper_length_mm, fill = species, colour = species)) +
geom_density(alpha = 0.4)+
scale_fill_manual(values = c( "#FF6A00","#C15CCB", "#00868B"))+
scale_colour_manual(values = c( "#FF6A00","#C15CCB", "#00868B"))+
theme_bw(base_size = 20)+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
labs(x = "Flipper Length (mm)", y = "Density", fill = "Species", colour = "Species")+
#theme(legend.position = "none") +
scale_x_continuous(expand = c(0,0), limits = c(165, 238))+
scale_y_continuous(expand = c(0,0), limits = c(0, 0.065), breaks=seq(0, 0.065, 0.02))+
theme(axis.title.y = element_text(vjust = 3)) +
theme(axis.title.x = element_text(vjust = -1))
flipper
flipper_mean <- penguins %>%
group_by(species) %>%
summarise(mean = mean(flipper_length_mm, na.rm = TRUE))
density <- flipper +
geom_vline(data = flipper_mean, aes(xintercept = mean, colour = species), linetype = "dashed", size = 1.3)
density
ggplot(na.omit(penguins), aes(x=species, y=flipper_length_mm, fill=sex)) +
geom_boxplot()
box_plot<- ggplot(na.omit(penguins), aes(x=species, y=flipper_length_mm, fill=sex)) +
geom_boxplot()+
theme_bw(base_size = 20)+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
labs(x = "Species", y = "Flipper Length (mm)", fill = "Sex")+
theme(axis.title.y = element_text(vjust = 3)) +
theme(axis.title.x = element_text(vjust = -1)) +
theme(legend.position = c(0.15, 0.8))
box_plot
There are several different methods that allow you to arrange your plots into different panels.
ggarrange
Here we will arrange four of the plots that we made above into one figure using ggarrange. You can specify the number of rows and columns with nrow and ncol, and add panel labels with labels.
Note we previously saved our figures (ellipse, density, bar, box_plot) in the Global Environment and we are calling on them here.
library(ggpubr)
ggarrange(ellipse, density, bar, box_plot,
nrow = 2, ncol = 2,
labels = c("a", "b", "c", "d"))
If two plots have the same axes, you can remove one and use align = "v" to ensure the plots remain the same size.
# Remove y axes text and title
lm <- lm +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
ggarrange(ellipse, lm,
nrow = 1, ncol = 2,
labels = c("a", "b"),
align = "v")
patchwork
patchwork is a very straightforward and intuitive package for arranging plots, e.g.:
plot1 + plot2 places two plots next to each other
plot1 / plot2 places plot2 under plot1
Let's make a fancy one.
We add the figure labels with e.g. labs(tag = 'a'). We will also make the two stacked plots half the width of the main plot with plot_layout(widths = c(2, 1)).
library(patchwork)
(ellipse + labs(tag = 'a') | ((bar + labs(tag = 'b')) / (box_plot + labs(tag = 'c')))) +
  plot_layout(widths = c(2, 1))
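As an alternative to tagging each plot by hand, patchwork can generate the tags with plot_annotation(tag_levels = "a"). The sketch below uses three simple stand-in plots (p1, p2, p3 are illustrative, not the figures built above) so that it runs on its own.

```r
library(ggplot2)
library(palmerpenguins)
library(patchwork)

# Three simple stand-in plots
p1 <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) + geom_point(na.rm = TRUE)
p2 <- ggplot(penguins, aes(species, body_mass_g)) + geom_boxplot(na.rm = TRUE)
p3 <- ggplot(penguins, aes(flipper_length_mm)) + geom_histogram(bins = 30, na.rm = TRUE)

combined <- (p1 | (p2 / p3)) +
  plot_layout(widths = c(2, 1)) +        # main plot twice as wide as the stacked column
  plot_annotation(tag_levels = "a")      # lowercase letter tags added automatically
combined
```

Automatic tags stay in sequence if panels are later added, removed or reordered.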